Hallucinating system outputs for discriminative language modeling

Authors

Brian Roark

Arda Çelebi

Erinç Dikici

Sanjeev Khudanpur

Maider Lehr

Emily Tucker Prud'hommeaux

Kenji Sagae

Murat Saraclar

Izhak Shafran

Puyang Xu

Abstract

Project overview
• NSF funded project and recent JHU summer workshop team
• General topic: discriminative language modeling for ASR and MT
  – Learning language models with discriminative objectives
• Specific topic: learning models from text only
  – Enabling use of much more training data; adaptation scenarios
• Have made some progress with ASR models (topic today)
  – Less progress on improving MT (even fully supervised)
• Talk includes a few other observations about DLM in general

Similar resources

Phrasal Cohort Based Unsupervised Discriminative Language Modeling

Simulated confusions enable the use of large text-only corpora for discriminative language modeling by hallucinating the likely recognition outputs that each (correct) sentence would be confused with. In [1], a novel approach was introduced to simulate confusions using phrasal cohorts derived directly from recognition output. However, the described approach relied on transcribed speech to deriv...

Data Sampling and Dimensionality Reduction Approaches for Reranking ASR Outputs Using Discriminative Language Models

This paper investigates various approaches to data sampling and dimensionality reduction for discriminative language models (DLM). DLM is a feature-based language modeling approach that aims to rerank the ASR output with discriminatively trained feature parameters. Using a Turkish morphology-based feature set, we examine the use of online Principal Component Analysis (PCA) as a dimensio...

Investigation of MT-based ASR confusion models for semi-supervised discriminative language modeling

Semi-supervised discriminative language modeling uses simulated N-best lists instead of real ASR outputs as its training examples. In this study we apply two techniques in which artificial examples are generated using a WFST and an MT system trained on pairs of reference text and ASR output. We compare the performance of these techniques with the structured prediction and ranking variants of th...

Unsupervised training methods for discriminative language modeling

Discriminative language modeling (DLM) aims to choose the most accurate word sequence by reranking the alternatives output by the automatic speech recognizer (ASR). The conventional (supervised) way of training a DLM requires a large amount of acoustic recordings together with their manual reference transcriptions. These transcriptions are used to determine the target ranks of the ASR outputs, ...

Discriminative, Syntactic Language Modeling through Latent SVMs

We construct a discriminative, syntactic language model (LM) by using a latent support vector machine (SVM) to train an unlexicalized parser to judge sentences. That is, the parser is optimized so that correct sentences receive high-scoring trees, while incorrect sentences do not. Because of this alternative objective, the parser can be trained with only a part-of-speech dictionary and binary-l...

Publication date: 2012
